Abstract

In this paper, we address the problem of generating preferred plans by combining the procedural control knowledge specified by Hierarchical Task Networks (HTNs) with rich qualitative user preferences. The outcome of our work is a language for specifying user preferences, tailored to HTN planning, together with a provably optimal preference-based planner, HTNPREF, that is implemented as an extension of SHOP2. To compute preferred plans, we propose an approach based on forward-chaining heuristic search. Our heuristic uses an admissible evaluation function measuring the satisfaction of preferences over partial plans. Our empirical evaluation demonstrates the effectiveness of our HTNPREF heuristics. We prove our approach sound and optimal with respect to the plans it generates by appealing to a situation calculus semantics of our preference language and of HTN planning. While our implementation builds on SHOP2, the language and techniques proposed here are relevant to a broad range of HTN planners.



1 Introduction 

Hierarchical Task Network (HTN) planning is a popular and widely used planning paradigm, and many domain-independent HTN planners exist (e.g., SHOP2, SIPE-2, I-X/I-PLAN, O-PLAN) (Ghallab, Nau, and Traverso 2004). In HTN planning, the planner is provided with a set of tasks to be performed, possibly together with constraints on those tasks. A plan is then formulated by repeatedly decomposing tasks into smaller and smaller subtasks until primitive, executable tasks are reached. A primary reason behind HTN's success is that its task networks capture useful procedural control knowledge (advice on how to perform a task) described in terms of a decomposition of subtasks. Such control knowledge can significantly reduce the search space for a plan while also ensuring that plans follow one of the stipulated courses of action. However, while HTNs specify a family of satisfactory plans, they are, for the most part, unable to distinguish what constitutes a high-quality plan.

In this paper, we address the problem of generating preferred plans by augmenting HTN planning problems with rich qualitative user preferences. User preferences can be arbitrarily complex, often involving combinations of conditional, interacting, and mutually exclusive preferences that can range over multiple states of a plan. This makes finding an optimal plan hard. There are two aspects to addressing the problem of preference-based planning with HTNs. The first is to propose a preference specification language that is tailored to HTN planning. The second is to generate preferred, and ideally optimal, plans efficiently.

To specify user preferences, we augment a rich qualitative preference language, LPP, proposed in (Bienvenu, Fritz, and McIlraith 2006), with HTN-specific constructs. LPP specifies preferences in a variant of linear temporal logic (LTL). Among the HTN-specific properties that we add to our language, LPH, are the ability to express preferences over how tasks in our HTN are decomposed into subtasks, preferences over the parameterizations of decomposed tasks, and a variety of temporal and nontemporal preferences over the task networks themselves.

To compute preferred plans, we propose an approach based on forward-chaining heuristic search. Key to our approach is a means of evaluating the (partial) satisfaction of preferences during HTN plan generation based on progression. The optimistic evaluation of preferences yields an admissible evaluation function, which we use to guide search. We implemented our planner, HTNPREF, as an extension to the SHOP2 HTN planner. Our empirical evaluation demonstrates the effectiveness of HTNPREF heuristics in finding high-quality plans. We provide a semantics for our preference language in the situation calculus (Reiter 2001) and appeal to this semantics to prove the soundness and optimality of our planner with respect to the plans it generates. This paper omits a number of technical details that can be found in a longer paper describing this work.

2 HTN Planning 

In this section, we provide a brief overview of both HTN planning, following (Ghallab, Nau, and Traverso 2004), and our situation calculus encoding of preference-based HTN planning.

Travel Example: Consider a simple HTN planning problem to address the task of arranging travel. This task can be decomposed into arranging transportation, accommodations, and local transportation. Each of these tasks can again be decomposed based on alternate modes of transportation and accommodations, reducing eventually to primitive actions that can be executed in the world. Further constraints can be imposed to restrict decompositions.



Definition 1 (HTN Planning Problem) An HTN planning problem is a 3-tuple $\mathcal{P} = (s_0, w, D)$ where $s_0$ is the initial state, $w$ is a task network called the initial task network, and $D$ is the HTN planning domain. $\mathcal{P}$ is a total-order planning problem if $w$ and $D$ are totally ordered; otherwise it is said to be partially ordered.

A task consists of a task symbol and a list of arguments. A task is primitive if its task symbol is an operator name and its parameters match; otherwise it is nonprimitive. In our example, arrange-trans and arrange-acc are nonprimitive tasks, while book-flight and book-car are primitive tasks.

Definition 2 (Task Network) A task network is a pair $w = (U, C)$ where $U$ is a set of task nodes and $C$ is a set of constraints. Each task node $u \in U$ contains a task $t_u$. If all of the tasks are ground, then $w$ is ground; if all of the tasks are primitive, then $w$ is called primitive; otherwise it is called nonprimitive. Task network $w$ is totally ordered if $C$ defines a total ordering of the nodes in $U$.

In our example, we could have a task network $(U, C)$ where $U = \{u_1, u_2\}$, $u_1$ = book-car, and $u_2$ = pay, and $C$ consists of a precedence constraint such that $u_1$ must occur before $u_2$ and a before-constraint such that at least one car is available for rent before $u_1$.

A domain is a pair $D = (O, M)$ where $O$ is a set of operators and $M$ is a set of methods. Operators are essentially primitive actions that can be executed in the world. They are described by a triple $o = (name(o), pre(o), eff(o))$, corresponding to the operator's name, preconditions, and effects. Preconditions are restricted to a set of literals, and effects are described as STRIPS-like Add and Delete lists. An operator $o$ can accomplish a ground primitive task in a state $s$ if their names match and $o$ is applicable in $s$. In our example, ignoring the parameters, operators might include: pay, book-train, book-car, book-hotel, and book-flight.

A method, $m$, is a 4-tuple $(name(m), task(m), subtasks(m), constr(m))$ corresponding to the method's name, a nonprimitive task, and the method's task network, comprising subtasks and constraints. A method is totally ordered if its task network is totally ordered. A domain is a total-order domain if every $m \in M$ is totally ordered. Method $m$ is relevant for a task $t$ if there is a substitution $\sigma$ such that $\sigma(t) = task(m)$. Several different methods can be relevant to a particular nonprimitive task $t$, leading to different decompositions of $t$. In our example, the method with name by-flight-trans can be used to decompose the task arrange-trans into the subtasks of booking a flight and paying, with the constraint (constr) that the booking precede payment.

Definition 3 (Solution to HTN Planning Problem) Given an HTN planning problem $\mathcal{P} = (s_0, w, D)$, a plan $\pi = (o_1, \ldots, o_k)$ is a solution for $\mathcal{P}$ if one of the following two cases applies: (1) if $w$ is primitive, then there must exist a ground instance $(U', C')$ of $(U, C)$ and a total ordering $(u_1, \ldots, u_k)$ of the nodes in $U'$ such that for all $1 \le i \le k$, $name(o_i) = t_{u_i}$, the plan $\pi$ is executable in the state $s_0$, and all the constraints hold; (2) if $w$ is nonprimitive, then there must exist a sequence of task decompositions that can be applied to $w$ to produce a primitive task network $w'$, where $\pi$ is a solution for $w'$.
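To make Definitions 1 to 3 concrete, here is a minimal Python sketch of total-order, SHOP-style forward decomposition on a hypothetical fragment of the travel example. It assumes ground tasks, totally ordered methods, and no task-network constraints; the `OPERATORS` and `METHODS` tables and the function `seek_plans` are illustrative names, not part of SHOP2 or HTNPREF.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy encoding of the travel example: ground tasks only, no
# parameters and no task-network constraints; all names are illustrative.
Task = Tuple[str, ...]  # (task_symbol, *args)

# Operators: name -> (preconditions, add list, delete list), STRIPS-style.
OPERATORS: Dict[str, Tuple[frozenset, frozenset, frozenset]] = {
    "book-flight": (frozenset(), frozenset({"flight-booked"}), frozenset()),
    "book-car": (frozenset({"car-available"}), frozenset({"car-booked"}),
                 frozenset({"car-available"})),
    "pay": (frozenset(), frozenset({"paid"}), frozenset()),
}

# Methods: nonprimitive task symbol -> [(method name, ordered subtasks)].
METHODS: Dict[str, List[Tuple[str, List[Task]]]] = {
    "arrange-trans": [
        ("by-flight-trans", [("book-flight",), ("pay",)]),
        ("by-car-trans", [("book-car",), ("pay",)]),
    ],
}


def seek_plans(state: frozenset, tasks: List[Task],
               plan: List[str]) -> Iterator[List[str]]:
    """Yield every primitive plan obtained by decomposing `tasks` in order."""
    if not tasks:
        yield plan
        return
    head, rest = tasks[0], tasks[1:]
    name = head[0]
    if name in OPERATORS:
        # Primitive task: apply the matching operator if it is executable.
        pre, add, delete = OPERATORS[name]
        if pre <= state:
            yield from seek_plans((state - delete) | add, rest, plan + [name])
    else:
        # Nonprimitive task: try every relevant method in turn.
        for _method_name, subtasks in METHODS.get(name, []):
            yield from seek_plans(state, subtasks + rest, plan)
```

For instance, `list(seek_plans(frozenset({"car-available"}), [("arrange-trans",)], []))` yields `[["book-flight", "pay"], ["book-car", "pay"]]`; without `car-available` in the initial state, only the flight plan remains. Each alternative method gives a different decomposition, which is exactly the choice point that preferences are later used to rank.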

Finally, we define the HTN preference-based planning problem. This definition appeals to two concepts that are not yet well-defined and which we defer to Section 3: the form and content of the formula $\Phi_{HTN}$ that captures user preferences for HTN planning, and the precise definition of more preferred.

Definition 4 (Preference-based HTN Planning) An HTN planning problem with user preferences is described as a 4-tuple $\mathcal{P} = (s_0, w, D, \Phi_{HTN})$ where $\Phi_{HTN}$ is a formula describing user preferences. A plan $\pi$ is a solution to $\mathcal{P}$ if and only if $\pi$ is a plan for $\mathcal{P}' = (s_0, w, D)$ and there does not exist a plan $\pi'$ for $\mathcal{P}'$ such that $\pi'$ is more preferred than $\pi$ with respect to the preference formula $\Phi_{HTN}$.

2.1 Situation Calculus Specification of HTN 

We now have a definition of preference-based HTN planning. Later in the paper, we propose an approach to computing preferred plans, together with a description of our implementation. To prove the correctness and optimality of our algorithm, we appeal to an existing situation calculus encoding of HTN planning, which we augment and extend to provide an encoding of preference-based HTN planning. Since the situation calculus has a well-defined semantics, we have a semantics for our encoding, which we use in our proofs. In this section, we review the salient features of this encoding.

The situation calculus is a logical language for specifying and reasoning about dynamical systems (Reiter 2001). In the situation calculus, the state of the world is expressed in terms of functions and relations (fluents) relativized to a particular situation $s$, e.g., $F(x, s)$. A situation $s$ is a history of the primitive actions, $a \in \mathcal{A}$, performed from a distinguished initial situation $S_0$. The function $do(a, s)$ maps a situation and an action into a new situation, thus inducing a tree of situations rooted in $S_0$. A basic action theory in the situation calculus, $\mathcal{D}$, includes domain-independent foundational axioms and domain-dependent axioms. A situation $s'$ precedes a situation $s$, written $s' \sqsubset s$, when the sequence $s'$ is a proper prefix of the sequence $s$.

Golog (Reiter 2001) is a high-level logic programming language for the specification and execution of complex actions in dynamical domains. It builds on top of the situation calculus by providing Algol-inspired extralogical constructs for assembling primitive situation calculus actions into complex actions (programs) $\delta$. Example complex actions include action sequences, if-then-else, while loops, nondeterministic choice of actions and action arguments, and procedures. These complex actions serve as constraints upon the situation tree. ConGolog (De Giacomo, Lespérance, and Levesque 2000) is the concurrent version of Golog, in which the language can additionally deal with the execution of concurrent processes, interrupts, prioritized concurrency, and exogenous actions.

A number of researchers have pointed out the connection between HTN and ConGolog. Following Gabaldon (2002), we map an HTN state to a situation calculus situation. Consequently, the initial HTN state $s_0$ is encoded as the initial situation, $S_0$. The HTN domain description maps to a corresponding situation calculus domain description, $\mathcal{D}$, where for every operator $o$ there is a corresponding primitive action $a$, such that the preconditions and the effects of $o$ are axiomatized in $\mathcal{D}$. Every method and nonprimitive task, together with its constraints, is encoded as a ConGolog procedure. For the purposes of this paper, the set of procedures in a ConGolog domain theory is referred to as $\mathcal{R}$.

We use a predicate $badSituation(s)$, proposed by Reiter (2001), to encode the constraints in a task network. The purpose of these constraints is to prune parts of the search space, much as temporal constraints do.

To deal with partially ordered task networks, we add two new primitive actions, $start(P(\vec{v}))$ and $end(P(\vec{v}))$, and two new fluents, $executing(P(\vec{v}), s)$ and $terminated(X, s)$, where $P(\vec{v})$ is a ConGolog procedure and $X$ is either $P(\vec{v})$ or an action $a \in \mathcal{A}$. $executing(P(\vec{v}), s)$ states that $P(\vec{v})$ is executing in situation $s$, and $terminated(X, s)$ states that $X$ has terminated in $s$. $executing(a, s)$, where $a \in \mathcal{A}$, is defined to be false. The successor state axioms for these fluents follow. They show how the actions $start(P(\vec{v}))$ and $end(P(\vec{v}))$ change the truth value of these fluents:

$$
\begin{aligned}
executing(P(\vec{v}), do(a, s)) &\equiv a = start(P(\vec{v})) \lor \big(executing(P(\vec{v}), s) \land a \neq end(P(\vec{v}))\big)\\
terminated(X, do(a, s)) &\equiv X = a \lor \big(X \in \mathcal{R} \land a = end(X)\big) \lor terminated(X, s)
\end{aligned}
$$

where $\mathcal{R}$ is the set of ConGolog procedures in our domain.
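The two successor state axioms can be read as a left fold over an action history. The Python sketch below assumes a hypothetical encoding in which actions are tuples such as `("book-car",)`, `("start", "arrange-trans")`, and `("end", "arrange-trans")`, procedures are identified by name, and both fluents are false in $S_0$; it evaluates the fluents in the situation reached by performing the history from $S_0$.

```python
from typing import Sequence, Set, Tuple

Action = Tuple[str, ...]  # ("book-car",), ("start", "P"), or ("end", "P")


def executing(proc: str, history: Sequence[Action]) -> bool:
    """executing(P, do(a, s)) == a = start(P) or (executing(P, s) and a != end(P))."""
    value = False  # assumed false in S0
    for a in history:  # apply the axiom once per do(a, s) step
        value = a == ("start", proc) or (value and a != ("end", proc))
    return value


def terminated(x: str, history: Sequence[Action], procedures: Set[str]) -> bool:
    """terminated(X, do(a, s)) == X = a or (X in R and a = end(X)) or terminated(X, s)."""
    value = False  # assumed false in S0
    for a in history:
        value = a == (x,) or (x in procedures and a == ("end", x)) or value
    return value
```

Because `terminated` only ever becomes true and stays true, while `executing` is reset by the matching `end` action, the two fluents together let the encoding test whether one procedure has finished before another starts.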

Definition 5 (Preference-based HTN in Situation Calculus)

An HTN planning problem with user preferences described as a 4-tuple $\mathcal{P} = (s_0, w, D, \Phi_{HTN})$ is encoded in the situation calculus as a 5-tuple $(\mathcal{D}, \mathcal{C}, \Delta, \delta_0, \Phi_{sc})$ where $\mathcal{D}$ is the basic action theory, $\mathcal{C}$ is the set of ConGolog axioms, $\Delta$ is the sequence of procedure declarations for all ConGolog procedures in $\mathcal{R}$, $\delta_0$ is an encoding of the initial task network in ConGolog, and $\Phi_{sc}$ is a mapping of the preference formula $\Phi_{HTN}$ into the situation calculus. A plan $\vec{a}$ is a solution to the encoded preference-based HTN problem if and only if

$$
\begin{aligned}
\mathcal{D} \cup \mathcal{C} \models\; & (\exists s)\, Do(\Delta; \delta_0, S_0, s) \land s = do(\vec{a}, S_0) \land \neg badSituation(s)\\
& \land \nexists s' .\, [Do(\Delta; \delta_0, S_0, s') \land \neg badSituation(s') \land pref(s', s, \Phi_{sc})]
\end{aligned}
$$

where $pref(s', s, \Phi_{sc})$ denotes that the situation $s'$ is preferred to situation $s$ with respect to the preference formula $\Phi_{sc}$, and $Do(\delta, S_0, do(\vec{a}, S_0))$ denotes that the ConGolog program $\delta$, starting execution in $S_0$, will legally terminate in situation $do(\vec{a}, S_0)$. Removing all the $start(P(\vec{v}))$ and $end(P(\vec{v}))$ actions from $\vec{a}$ to obtain $\vec{b} = (b_1, \ldots, b_n)$, a preferred plan for the original HTN planning problem $\mathcal{P}$ is a plan $\pi = (o_1, \ldots, o_n)$ where for all $1 \le i \le n$, $name(o_i) = b_i$.
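The last step of Definition 5, recovering the HTN plan from the situation calculus action sequence, amounts to filtering out the bookkeeping actions. A minimal Python sketch, assuming the same hypothetical tuple encoding of actions and that no operator is itself named `start` or `end`:

```python
from typing import List, Sequence, Tuple

Action = Tuple[str, ...]  # ("book-car", "Avis") or ("start", "arrange-trans")


def to_htn_plan(actions: Sequence[Action]) -> List[Action]:
    """Drop the start(P)/end(P) actions, keeping the primitive actions in order.

    Assumes no operator is named "start" or "end" (hypothetical encoding).
    """
    return [a for a in actions if a[0] not in ("start", "end")]
```

The order of the remaining actions is preserved, so the $i$-th kept action corresponds to $b_i$ and hence to operator $o_i$.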

3 HTN Preference Specification 

In this section, we describe how to specify the preference formula $\Phi_{HTN}$. Our preference language, LPH, modifies and extends the LPP qualitative preference language proposed in (Bienvenu, Fritz, and McIlraith 2006) to capture HTN-specific preferences.

Our LPH language has the ability to express preferences over certain parameterizations of a task (e.g., preferring one task grounding to another), over a certain decomposition of nonprimitive tasks (i.e., preferring to apply a certain method over another), and a soft version of the before, after, and in-between constraints. A soft constraint is defined via a preference formula whose evaluation determines when a plan is more preferred than another. However, unlike the task network constraints, which prune or eliminate those plans that do not satisfy them, failing to meet a soft constraint simply deems a plan to be of poorer quality.

Definition 6 (Basic Desire Formula (BDF)) A basic desire formula is a sentence drawn from the smallest set $\mathcal{B}$ where:

1. If $l$ is a literal, then $l \in \mathcal{B}$ and $final(l) \in \mathcal{B}$.

2. If $t$ is a task, then $occ(t) \in \mathcal{B}$.

3. If $m$ is a method and $n = name(m)$, then $apply(n) \in \mathcal{B}$.

4. If $t_1$ and $t_2$ are tasks and $l$ is a literal, then $before(t_1, t_2)$, $holdBefore(t_1, l)$, $holdAfter(t_1, l)$, and $holdBetween(t_1, l, t_2)$ are in $\mathcal{B}$.

5. If $\varphi_1$ and $\varphi_2$ are in $\mathcal{B}$, then so are $\neg\varphi_1$, $\varphi_1 \land \varphi_2$, $\varphi_1 \lor \varphi_2$, $(\exists x)\varphi_1$, $(\forall x)\varphi_1$, $next(\varphi_1)$, $always(\varphi_1)$, $eventually(\varphi_1)$, and $until(\varphi_1, \varphi_2)$.

$final(l)$ states that the literal $l$ holds in the final state; $occ(t)$ states that the task $t$ occurs in the present state; and $next(\varphi_1)$, $always(\varphi_1)$, $eventually(\varphi_1)$, and $until(\varphi_1, \varphi_2)$ are basic LTL constructs. $apply(n)$ states that a method whose name is $n$ is applied to decompose a nonprimitive task. $before(t_1, t_2)$ states a precedence ordering between two tasks. $holdBefore(t_1, l)$, $holdAfter(t_1, l)$, and $holdBetween(t_1, l, t_2)$ state a soft constraint over when the fluent $l$ is preferred to hold (e.g., $holdBefore(t_1, l)$ states that $l$ must be true right before the last operator descendant of $t_1$ occurs). Combining $occ(t)$ with the rest of the LPH language enables the construction of preference statements over parameterizations of tasks.

BDFs establish properties of different states within a plan. By combining BDFs using boolean and temporal connectives, we are able to express other properties of states. The following are a few examples from our travel domain.¹

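$$
\begin{array}{ll}
(\exists c).\, occ'(\textit{book-car}(c, \textit{Enterprise})) & \text{(P1)}\\
apply'(\textit{by-car-local}(\textit{SUV}, \textit{Avis})) & \text{(P2)}\\
before(\textit{arrange-trans}, \textit{arrange-acc}) & \text{(P3)}\\
holdBefore(\textit{arrange-trans}, \textit{hotelReservation}) & \text{(P4)}\\
always(\neg occ'(\textit{pay}(\textit{MasterCard}))) & \text{(P5)}\\
(\exists h, r).\, occ'(\textit{book-hotel}(h, r)) \land starsGE(r, 3) & \text{(P6)}\\
(\exists c).\, occ'(\textit{book-flight}(c, \textit{Economy}, \textit{Direct}, \textit{WindowSeat})) \land member(c, \textit{StarAlliance}) & \text{(P7)}
\end{array}
$$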



P1 states that at some point the user books a car with Enterprise. P2 states that at some point, the by-car-local method is applied to book an SUV from Avis. P3 states that the arrange-trans task occurs before the arrange-acc task. P4 states that the hotel is reserved before transportation is arranged. P5 states that the user never pays by MasterCard. P6 states that at some point the user books a hotel that has a rating of 3 or more. P7 states that at some point the user books a direct, economy, window-seat flight with a Star Alliance carrier.

To define a preference ordering over alternative properties of states, Atomic Preference Formulae (APFs) are defined. Each alternative comprises two components: the property of the state, specified by a BDF, and a value term that stipulates the relative strength of the preference.

¹To simplify the examples, many parameters have been suppressed; we abbreviate $eventually(occ(X))$ by $occ'(X)$ and $eventually(apply(n))$ by $apply'(n)$, and refer to preferences by their labels.



Definition 7 (Atomic Preference Formula (APF))

Let $\mathcal{V}$ be a totally ordered set with minimal element $v_{min}$ and maximal element $v_{max}$. An atomic preference formula is a formula $\varphi_0[v_0] \gg \varphi_1[v_1] \gg \cdots \gg \varphi_n[v_n]$, where each $\varphi_i$ is a BDF, each $v_i \in \mathcal{V}$, $v_i < v_j$ for $i < j$, and $v_0 = v_{min}$. When $n = 0$, atomic preference formulae correspond to BDFs.

While one could let $\mathcal{V} = [0, 1]$, one could also choose a strictly qualitative set like $\{best < good < indifferent < bad < worst\}$ to express preferences over alternatives.

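Here are a few APF examples from the travel domain.

$$
\begin{array}{ll}
P2[0] \gg apply'(\textit{by-car-local}(\textit{SUV}, \textit{National}))[0.3] & \text{(P8)}\\
apply'(\textit{by-car-trans})[0] \gg apply'(\textit{by-flight})[0.4] & \text{(P9)}\\
occ'(\textit{book-train})[0] \gg occ'(\textit{book-car})[0.4] & \text{(P10)}
\end{array}
$$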

P8 states that the user prefers that the by-car-local method rents an SUV and that the rental car company Avis is preferred to National. P9 states that the user prefers to decompose the arrange-trans task by the method by-car-trans rather than the by-flight method. Note that the task is implicit in the definition of the method. P10 states that the user prefers travelling by train over renting a car.

To allow the user to specify more complex preferences and to aggregate preferences, General Preference Formulae (GPFs) extend the language to conditional, conjunctive, and disjunctive preferences.

Definition 8 (General Preference Formula (GPF))

A formula $\Phi$ is a GPF if one of the following holds:

• $\Phi$ is an APF

• $\Phi$ is $\gamma : \Psi$, where $\gamma$ is a BDF and $\Psi$ is a GPF [Conditional]

• $\Phi$ is one of $\Psi_0 \,\&\, \Psi_1 \,\&\, \cdots \,\&\, \Psi_n$ [General Conjunction] or $\Psi_0 \mid \Psi_1 \mid \cdots \mid \Psi_n$ [General Disjunction], where $n \ge 1$ and each $\Psi_i$ is a GPF.

General conjunction (resp. general disjunction) refines the ordering defined by $\Psi_0 \,\&\, \Psi_1 \,\&\, \cdots \,\&\, \Psi_n$ (resp. $\Psi_0 \mid \Psi_1 \mid \cdots \mid \Psi_n$) by sorting indistinguishable states using the lexicographic ordering. Continuing our example:

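$$
\begin{array}{ll}
occ(\textit{arrange-trans}) : (\exists c).\, occ(\textit{book-car}(c, \textit{Avis})) & \text{(P11)}\\
occ(\textit{arrange-local-trans}) : P1 & \text{(P12)}\\
drivable : P10[0] \gg occ(\textit{book-flight})[0.8] & \text{(P13)}\\
P4 \,\&\, P6 \,\&\, P7 \,\&\, P8 \,\&\, P9 \,\&\, P10 \,\&\, P12 \,\&\, P13 & \text{(P14)}
\end{array}
$$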



P11 states that if inter-city transportation is being arranged, then the user prefers to rent a car from Avis. P12 states that if local transportation is being arranged, the user prefers Enterprise. P13 states that if the distance between the origin and the destination is drivable, then the user prefers to book a train over booking a car over booking a flight. P14 aggregates preferences into one formula.

Again, and only for the purpose of proving properties, we provide an encoding of the HTN-specific terms of LPH in the situation calculus. As such, for any preference formula $\Phi_{HTN}$ there is a corresponding formula $\Phi_{sc}$ in which every HTN-specific term is replaced as follows: each literal $l$ is mapped to a fluent or non-fluent relation in the situation calculus, as appropriate; each primitive task $t$ is mapped to an action $a \in \mathcal{A}$; and each nonprimitive task $t$ and each method $m$ is mapped to a procedure $P(\vec{v}) \in \mathcal{R}$ in ConGolog.



occ(X) [s',s] = 



3.1 The Semantics 

The semantics of LPH is achieved by assigning a weight to a situation $s$ with respect to a GPF $\Phi$, written $w_s(\Phi)$. This weight is a composition of the weights of its constituents. For BDFs, a situation $s$ is assigned the value $v_{min}$ if the BDF is satisfied in $s$, and $v_{max}$ otherwise. Similarly, given an APF and a situation $s$, $s$ is assigned the weight of the best BDF that it satisfies within the defined APF. Finally, GPF semantics follow the natural semantics of boolean connectives. Since lower weights are better, General Conjunction yields the maximum (worst) of its constituent GPF weights and General Disjunction yields the minimum (best).

Similar to (Gabaldon 2004) and following LPP, we use the notation $\varphi[s', s]$ to denote that $\varphi$ holds in the sequence of situations starting from $s'$ and terminating in $s$. Next, we show how to interpret BDFs in the situation calculus.

If $f$ is a fluent, we write $f[s', s] = f[s']$ since fluents are represented in situation-suppressed form. If $r$ is a non-fluent, we have $r[s', s] = r$ since $r$ is already a situation calculus formula. Furthermore, we write $final(f)[s', s] = f[s]$ since $final(f)$ means that the fluent $f$ must hold in the final situation.

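The BDF $occ(X)$ states the occurrence of $X$, which can be either an action or a procedure:

$$
occ(X)[s', s] =
\begin{cases}
do(X, s') \sqsubseteq s & \text{if } X \in \mathcal{A}\\
do(start(X), s') \sqsubseteq s & \text{if } X \in \mathcal{R}
\end{cases}
$$

The BDF $apply(P(\vec{v}))$ is interpreted as follows:

$$
apply(P(\vec{v}))[s', s] = do(start(P(\vec{v})), s') \sqsubseteq s
$$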

Boolean connectives and quantifiers are already part of the situation calculus and require no further explanation here. The LTL constructs are interpreted in the same way as in (Gabaldon 2004). We interpret the remaining constructs as follows:

$$
\begin{aligned}
before(X_1, X_2)[s', s] =\; & (\exists s_1, s_2 : s' \sqsubseteq s_1 \sqsubseteq s_2 \sqsubseteq s)\\
& \{terminated(X_1)[s_1] \land \neg executing(X_2)[s_1] \land \neg terminated(X_2)[s_1] \land occ(X_2)[s_2, s]\}\\[4pt]
holdBefore(X, l)[s', s] =\; & (\exists s_1 : s' \sqsubseteq s_1 \sqsubseteq s)\{l[s_1] \land occ(X)[s_1, s]\}\\[4pt]
holdAfter(X, l)[s', s] =\; & (\exists s_1 : s' \sqsubseteq s_1 \sqsubseteq s)\{terminated(X)[s_1] \land l[s_1]\}\\[4pt]
holdBetween(X_1, l, X_2)[s', s] =\; & (\exists s_1, s_2 : s' \sqsubseteq s_1 \sqsubseteq s_2 \sqsubseteq s)\\
& \{terminated(X_1)[s_1] \land \neg executing(X_2)[s_1] \land \neg terminated(X_2)[s_1] \land occ(X_2)[s_2, s]\\
& \;\land (\forall s_3 : s_1 \sqsubseteq s_3 \sqsubseteq s_2)\, l[s_3]\}
\end{aligned}
$$

From here, the semantics follows that of LPP.
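To illustrate how these definitions read over concrete histories, the following Python sketch evaluates $occ$, $holdBefore$, and $holdAfter$ on finite traces. It assumes a hypothetical encoding in which a situation is the tuple of actions performed from $S_0$ (so $s_1 \sqsubseteq s_2$ is the prefix relation), procedures start and end with `("start", P)` and `("end", P)`, and a fluent is any Python predicate on situations; none of these names come from HTNPREF.

```python
from typing import Callable, FrozenSet, List, Tuple

Action = Tuple[str, ...]        # ("book-car",) or ("start", "arrange-trans")
Situation = Tuple[Action, ...]  # action history from S0; S0 is ()
Fluent = Callable[[Situation], bool]


def between(s_from: Situation, s_to: Situation) -> List[Situation]:
    """Every s1 with s_from ⊑ s1 ⊑ s_to, assuming s_from is a prefix of s_to."""
    return [s_to[:k] for k in range(len(s_from), len(s_to) + 1)]


def occ(x: str, s_from: Situation, s_to: Situation, procs: FrozenSet[str]) -> bool:
    """occ(X)[s', s]: do(X, s') ⊑ s, or do(start(X), s') ⊑ s when X is a procedure."""
    first = ("start", x) if x in procs else (x,)
    ext = s_from + (first,)
    return s_to[:len(ext)] == ext


def terminated(x: str, s: Situation, procs: FrozenSet[str]) -> bool:
    """terminated(X)[s]: X was performed (action) or ended (procedure) in s."""
    return (("end", x) if x in procs else (x,)) in s


def hold_before(x: str, l: Fluent, s_from: Situation, s_to: Situation,
                procs: FrozenSet[str]) -> bool:
    """holdBefore(X, l)[s', s] = (∃s1 : s' ⊑ s1 ⊑ s){l[s1] ∧ occ(X)[s1, s]}."""
    return any(l(s1) and occ(x, s1, s_to, procs) for s1 in between(s_from, s_to))


def hold_after(x: str, l: Fluent, s_from: Situation, s_to: Situation,
               procs: FrozenSet[str]) -> bool:
    """holdAfter(X, l)[s', s] = (∃s1 : s' ⊑ s1 ⊑ s){terminated(X)[s1] ∧ l[s1]}."""
    return any(terminated(x, s1, procs) and l(s1) for s1 in between(s_from, s_to))
```

For example, with `procs = frozenset({"arrange-trans"})` and the trace `(("book-hotel",), ("start", "arrange-trans"), ("book-car",), ("end", "arrange-trans"))`, a fluent `hotel_reserved` that checks whether `("book-hotel",)` is in the history makes both `hold_before("arrange-trans", ...)` and `hold_after("arrange-trans", ...)` true, which is the situation P4 describes. The existential over intermediate situations becomes a simple scan over prefixes.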

Definition 9 (Basic Desire Satisfaction) Let $\mathcal{D}$ be an action theory, and let $s'$ and $s$ be situations such that $s' \sqsubseteq s$. The situations beginning in $s'$ and terminating in $s$ satisfy $\varphi$ just in the case that $\mathcal{D} \models \varphi[s', s]$. We define $w_{s',s}(\varphi)$ to be the weight of the situations originating in $s'$ and ending in $s$ with respect to BDF $\varphi$: $w_{s',s}(\varphi) = v_{min}$ if $\varphi$ is satisfied; otherwise $w_{s',s}(\varphi) = v_{max}$.

Note that for readability we drop $s'$ from the index, i.e., $w_s(\varphi) = w_{s',s}(\varphi)$ in the special case of $s' = S_0$.



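We use the following abbreviations:

$$
\begin{aligned}
(\exists s_1 : s' \sqsubseteq s_1 \sqsubseteq s)\Phi &= (\exists s_1)\{s' \sqsubseteq s_1 \land s_1 \sqsubseteq s \land \Phi\}\\
(\forall s_1 : s' \sqsubseteq s_1 \sqsubseteq s)\Phi &= (\forall s_1)\{[s' \sqsubseteq s_1 \land s_1 \sqsubseteq s] \supset \Phi\}
\end{aligned}
$$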


Definition 10 (Atomic Preference Satisfaction) Let $s$ be a situation and $\Phi = \varphi_0[v_0] \gg \varphi_1[v_1] \gg \cdots \gg \varphi_n[v_n]$ be an atomic preference formula. Then $w_s(\Phi) = v_i$ if $i = \min\{j : \mathcal{D} \models \varphi_j[S_0, s]\}$, and $w_s(\Phi) = v_{max}$ if no such $i$ exists.
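Because the values of an APF increase with the index, Definition 10 simply returns the value of the first satisfied alternative. A minimal Python sketch, assuming $\mathcal{V} = [0, 1]$ and representing each BDF as a hypothetical predicate over the action history from $S_0$ (here, $occ'$ is approximated by "the action name appears in the history"):

```python
from typing import Callable, Sequence, Tuple

V_MIN, V_MAX = 0.0, 1.0  # assumption: V = [0, 1]

# An APF phi_0[v_0] >> ... >> phi_n[v_n] as (BDF test, value) pairs, with
# strictly increasing values and v_0 = V_MIN (hypothetical encoding).
APF = Sequence[Tuple[Callable[[Sequence[str]], bool], float]]


def apf_weight(apf: APF, s: Sequence[str]) -> float:
    """w_s(APF): value of the lowest-index satisfied BDF, or V_MAX if none is."""
    for satisfied, value in apf:
        if satisfied(s):
            return value
    return V_MAX
```

Encoding P10 as `[(lambda s: "book-train" in s, 0.0), (lambda s: "book-car" in s, 0.4)]`, a history containing `book-car` but not `book-train` gets weight 0.4, one containing `book-train` gets 0.0, and one with neither gets $v_{max}$.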

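Definition 11 (General Preference Satisfaction) Let $s$ be a situation and $\Phi$ be a general preference formula. Then $w_s(\Phi)$ is defined as follows:

• $w_s(\varphi_0[v_0] \gg \varphi_1[v_1] \gg \cdots \gg \varphi_n[v_n])$ is defined above

• $w_s(\gamma : \Psi) = v_{min}$ if $w_s(\gamma) = v_{max}$, and $w_s(\Psi)$ otherwise

• $w_s(\Psi_0 \,\&\, \Psi_1 \,\&\, \cdots \,\&\, \Psi_n) = \max\{w_s(\Psi_i) : 0 \le i \le n\}$

• $w_s(\Psi_0 \mid \Psi_1 \mid \cdots \mid \Psi_n) = \min\{w_s(\Psi_i) : 0 \le i \le n\}$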



The following definition dictates how to compare two situations (and thus two plans) with respect to a GPF. This preference relation $pref$ is used to compare HTN plans in Definition 5 and provides the semantics for more preferred in Definition 4.

Definition 12 (Preferred Situations) A situation $s_1$ is preferred to a situation $s_2$ with respect to a GPF $\Phi$, written $pref(s_1, s_2, \Phi)$, if $w_{s_1}(\Phi) < w_{s_2}(\Phi)$.
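Definitions 10 to 12 can be combined into a small recursive evaluator. The Python sketch below is a hypothetical encoding, not the paper's implementation: GPFs are nested tuples (`"apf"`, `"cond"`, `"and"`, `"or"`), a situation is approximated by the set of atoms and action names that hold along the plan, $\mathcal{V} = [0, 1]$, and `pref` uses the strict comparison on weights.

```python
from typing import Any, Collection

V_MIN, V_MAX = 0.0, 1.0  # assumption: V = [0, 1]; lower weight = more preferred

Situation = Collection[str]  # toy stand-in for the atoms/actions along a plan


def weight(phi: Any, s: Situation) -> float:
    """w_s(Phi) for a GPF encoded as nested tuples (hypothetical encoding):
    ("apf", [(bdf, v), ...]), ("cond", gamma, psi), ("and", [...]), ("or", [...]).
    """
    kind = phi[0]
    if kind == "apf":  # Definition 10: first satisfied alternative
        return next((v for sat, v in phi[1] if sat(s)), V_MAX)
    if kind == "cond":  # gamma : Psi, where w_s(gamma) = V_MAX iff gamma is false
        _, gamma, psi = phi
        return weight(psi, s) if gamma(s) else V_MIN
    if kind == "and":  # general conjunction: worst constituent
        return max(weight(p, s) for p in phi[1])
    if kind == "or":  # general disjunction: best constituent
        return min(weight(p, s) for p in phi[1])
    raise ValueError(f"unknown GPF constructor: {kind!r}")


def pref(s1: Situation, s2: Situation, phi: Any) -> bool:
    """pref(s1, s2, Phi), read strictly: s1 has a lower weight than s2."""
    return weight(phi, s1) < weight(phi, s2)
```

As a usage example in the spirit of P13 (with hypothetical values 0, 0.4, and 0.6), the conditional `("cond", lambda s: "drivable" in s, train_car_flight)` gives $v_{min}$ whenever the trip is not drivable, so such a plan is never penalized by it; when the trip is drivable, a plan that books a car is preferred to one that books a flight.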

4 Computing Preferred Plans

To compute a preferred plan, we propose a heuristic-search, forward-chaining planner that searches for the most preferred terminating state that satisfies the HTN planning problem. The search is guided by an admissible evaluation function that evaluates partial plans with respect to preference satisfaction. We use progression to evaluate the satisfaction of the preference formula over partial plans.

4.1 Progression 

Given a situation and a temporal formula, progression evaluates the formula with respect to the state of the situation to generate a new formula representing those aspects of the formula that remain to be satisfied. In this section, we define the progression of the constructs we added to or modified from LPP and show that progression preserves the semantics of preference formulae. To define progression, similar to (Bienvenu, Fritz, and McIlraith 2006), we add the propositional constants TRUE and FALSE to both the situation calculus and our set of BDFs, where $\mathcal{D} \models \text{TRUE}$ and $\mathcal{D} \not\models \text{FALSE}$ for every action theory $\mathcal{D}$. We also add the BDFs $occNext(X)$ and $applyNext(P(\vec{v}))$ to capture the progression of $occ(X)$ and $apply(P(\vec{v}))$. Below we show the progression of the added constructs.

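Definition 13 (Progression) Let $s$ be a situation, and let $\varphi$ be a BDF. The progression of $\varphi$ through $s$, written $\rho_s(\varphi)$, is given by:

• If $\varphi = occ(X)$, then $\rho_s(\varphi) = occNext(X) \land eventually(terminated(X))$

• If $\varphi = occNext(X)$, then
$$
\rho_s(\varphi) =
\begin{cases}
\text{TRUE} & \text{if } X \in \mathcal{A} \land \mathcal{D} \models \exists s'.\, s = do(X, s')\\
\text{TRUE} & \text{if } X \in \mathcal{R} \land \mathcal{D} \models \exists s'.\, s = do(start(X), s')\\
\text{FALSE} & \text{otherwise}
\end{cases}
$$

• If $\varphi = apply(P(\vec{v}))$, then $\rho_s(\varphi) = applyNext(P(\vec{v})) \land eventually(terminated(P(\vec{v})))$

• If $\varphi = applyNext(P(\vec{v}))$, then
$$
\rho_s(\varphi) =
\begin{cases}
\text{TRUE} & \text{if } \mathcal{D} \models \exists s'.\, s = do(start(P(\vec{v})), s')\\
\text{FALSE} & \text{otherwise}
\end{cases}
$$

• If $\varphi = before(X_1, X_2)$, $holdBefore(X, l)$, $holdAfter(X, l)$, or $holdBetween(X_1, l, X_2)$, then
$$
\rho_s(\varphi) =
\begin{cases}
\text{TRUE} & \text{if } w_s(\varphi) = v_{min}\\
\text{FALSE} & \text{otherwise}
\end{cases}
$$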

To see how the other constructs are progressed, please refer to (Bienvenu, Fritz, and McIlraith 2006).
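The occ and apply cases of Definition 13 can be sketched in a few lines of Python. This sketch assumes a hypothetical encoding (formulas as nested tuples, TRUE and FALSE as sentinel tuples, a situation as the tuple of actions from $S_0$, and `procs` as the set of procedure names $\mathcal{R}$); it only covers the constructs added in this paper, and the LPP constructs are deliberately left unimplemented.

```python
from typing import FrozenSet, Tuple

Action = Tuple[str, ...]  # ("book-car",) or ("start", "by-car-trans")
TRUE, FALSE = ("TRUE",), ("FALSE",)


def progress(phi: tuple, s: Tuple[Action, ...], procs: FrozenSet[str]) -> tuple:
    """rho_s(phi) for the occ/apply cases of Definition 13.

    s is the action history from S0; only its last action matters here,
    since s = do(last, s') for some s'.
    """
    op = phi[0]
    last = s[-1] if s else None
    if op == "occ":
        return ("and", ("occNext", phi[1]), ("eventually", ("terminated", phi[1])))
    if op == "apply":
        return ("and", ("applyNext", phi[1]), ("eventually", ("terminated", phi[1])))
    if op == "occNext":
        x = phi[1]
        expected = ("start", x) if x in procs else (x,)
        return TRUE if last == expected else FALSE
    if op == "applyNext":
        return TRUE if last == ("start", phi[1]) else FALSE
    raise NotImplementedError(f"{op}: progressed as in LPP, not sketched here")
```

Note that `occNext` and `applyNext` only inspect the most recent action, which is what makes progression incremental: each search step adds one action and the remaining formula is updated without re-scanning the whole history.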

4.2 Admissible Evaluation Function 

In this section, we describe an admissible evaluation function using the notion of optimistic and pessimistic weights, which bound the best and worst weights of any successor situation with respect to a GPF $\Phi$. Optimistic (resp. pessimistic) weights, $w^{opt}_s(\Phi)$ (resp. $w^{pess}_s(\Phi)$), are defined based on optimistic (resp. pessimistic) satisfaction of BDFs. Optimistic satisfaction ($\varphi[s', s]^{opt}$) assumes that any parts of the BDF not yet falsified will eventually be satisfied. Pessimistic satisfaction ($\varphi[s', s]^{pess}$) assumes the opposite. The following definitions highlight the key differences between this work and the definitions in (Bienvenu, Fritz, and McIlraith 2006).

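$$
occ(X)[s', s]^{opt} =
\begin{cases}
do(X, s') \sqsubseteq s \lor s' = s & \text{if } X \in \mathcal{A}\\
do(start(X), s') \sqsubseteq s \lor s' = s & \text{if } X \in \mathcal{R}
\end{cases}
$$

$$
occ(X)[s', s]^{pess} =
\begin{cases}
do(X, s') \sqsubseteq s & \text{if } X \in \mathcal{A}\\
do(start(X), s') \sqsubseteq s & \text{if } X \in \mathcal{R}
\end{cases}
$$

$$
apply(P(\vec{v}))[s', s]^{opt} = do(start(P(\vec{v})), s') \sqsubseteq s \lor s' = s
$$

$$
apply(P(\vec{v}))[s', s]^{pess} = do(start(P(\vec{v})), s') \sqsubseteq s
$$

If $\varphi = before(X_1, X_2)$, $holdBefore(X, l)$, $holdAfter(X, l)$, or $holdBetween(X_1, l, X_2)$, then

$$
\varphi[s', s]^{opt} \stackrel{\text{def}}{=} \rho_s(\varphi)
$$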
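A small Python sketch of the $occ$ cases above, under the same hypothetical tuple encoding of situations as action histories from $S_0$ and assuming $\mathcal{V} = [0, 1]$. It shows the intended behaviour: at $S_0$ the optimistic reading of $occ(\textit{book-car})$ holds while the pessimistic one does not, and once a different action is taken first, the optimistic weight rises and never falls back.

```python
from typing import FrozenSet, Tuple

Action = Tuple[str, ...]
Situation = Tuple[Action, ...]  # action history from S0; S0 is ()
V_MIN, V_MAX = 0.0, 1.0         # assumption: V = [0, 1]


def _first_action(x: str, procs: FrozenSet[str]) -> Action:
    """The action that marks X occurring: X itself, or start(X) for a procedure."""
    return ("start", x) if x in procs else (x,)


def occ_pess(x: str, s_from: Situation, s_to: Situation, procs: FrozenSet[str]) -> bool:
    """occ(X)[s', s]^pess: X (or start(X)) already follows s' within s."""
    ext = s_from + (_first_action(x, procs),)
    return s_to[:len(ext)] == ext


def occ_opt(x: str, s_from: Situation, s_to: Situation, procs: FrozenSet[str]) -> bool:
    """occ(X)[s', s]^opt: as pess, or s' = s so that X may still come next."""
    return occ_pess(x, s_from, s_to, procs) or s_from == s_to


def bdf_weight(satisfied: bool) -> float:
    """BDF weight as in Definition 9: V_MIN when satisfied, V_MAX otherwise."""
    return V_MIN if satisfied else V_MAX
```

The gap between the two readings is what lets the planner keep a partial plan's optimistic weight low (never over-estimating) while its pessimistic weight records what is already guaranteed.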



Theorem 1 Let $s_n = do([a_1, \ldots, a_n], S_0)$, $n \ge 0$, be a collection of situations, $\varphi$ be a BDF, $\Phi$ a general preference formula, and $w^{opt}_s(\Phi)$ and $w^{pess}_s(\Phi)$ be the optimistic and pessimistic weights of $\Phi$ with respect to $s$. Then for any $0 \le i \le j \le k \le n$,

Theorem 1 states that the optimistic weight is non-decreasing and never over-estimates the real weight. Thus, $f_\Phi$ is admissible, and when it is used in best-first search, the search is optimal.

Definition 14 (Evaluation function) Let $s = do(\vec{a}, S_0)$ be a situation and let $\Phi$ be a general preference formula. Then $f_\Phi(s) = w_s(\Phi)$ if $\vec{a}$ is a plan; otherwise $f_\Phi(s) = w^{opt}_s(\Phi)$.

5 Implementation and Results 

In this section, we describe our best-first search, ordered-task-decomposition planner. Figure 1 outlines the algorithm. HTNPREF takes as input $\mathcal{P} = (s_0, w, D, pref)$, where $s_0$ is the initial state, $w$ the initial task network, $D$ the HTN planning domain, and $pref$ the general preference formula, and returns a sequence of ground primitive operators, i.e., a plan, together with the weight of that plan.

The frontier is a list of nodes of the form [optW, pessW, w, partialP, s, pref], sorted by optimistic weight, pessimistic weight, and then by plan length. The frontier is initialized to the initial task network $w$, the empty partial plan, and its optW, pessW, and pref corresponding to the progression and evaluation of the input preference formula in the initial state.

    HTNPREF(s0, w, D, pref)
      frontier ← INITFRONTIER(s0, w, pref)
      while frontier ≠ ∅
        current ← REMOVEFIRST(frontier)
        % establishes values of w, partialP, s, progPref
        if w = ∅ and optW = pessW then return partialP, optW
        neighbours ← EXPAND(w, D, partialP, s, progPref)
        frontier ← SORTNMERGE(neighbours, frontier)
      return [], ∞

Figure 1: A sketch of the HTNPREF algorithm.

On each iteration of the while loop, HTNPREF removes the first node from the frontier and places it in current. If $w$ is empty (i.e., $U$ is an empty set), the situation associated with this node is a terminating situation, and HTNPREF returns current's partial plan and weight. Otherwise, it calls the function EXPAND with current's node as input.

EXPAND returns a new list of nodes that need to be added to the frontier. The new nodes are sorted by optW and pessW, and merged with the remainder of the frontier. If $w$ is nil, then the frontier is left as is. Otherwise, EXPAND generates a new set of nodes of the form [optW, pessW, newW, newPartialP, newS, newProgPref], one for each legal ground operator that can be reached by performing $w$ using a partial-order forward decomposition procedure (PFD) (Ghallab, Nau, and Traverso 2004). Currently, HTNPREF uses SHOP2 (Nau et al. 2003) as its PFD. Hence, the current implementation of HTNPREF is an implementation of SHOP2 with user preferences. For each primitive task leading to terminating states, EXPAND generates a node of the same form but with optW and pessW replaced by the actual weight. If we reach the empty frontier, we return the empty plan.
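The loop in Figure 1 can be sketched in Python as a best-first search over a binary heap keyed on (optW, pessW, plan length). This is a minimal sketch, not HTNPREF itself: `best_first`, `toy_expand`, and `toy_weights` are hypothetical stand-ins, `toy_expand` plays the role of one PFD step (SHOP2 in the paper), and `toy_weights` is an admissible optimistic/pessimistic estimate only for this two-method toy problem.

```python
import heapq
import itertools
from typing import Callable, Iterable, List, Tuple

Tasks = Tuple[str, ...]
Plan = Tuple[str, ...]
Expand = Callable[[Tasks, Plan], Iterable[Tuple[Tasks, Plan]]]
Weights = Callable[[Tasks, Plan], Tuple[float, float]]


def best_first(tasks: Tasks, expand: Expand, weights: Weights) -> Tuple[List[str], float]:
    """Best-first search keyed on (optW, pessW, plan length), as in Figure 1."""
    tie = itertools.count()  # unique counter: the heap never compares task tuples
    opt_w, pess_w = weights(tasks, ())
    frontier = [(opt_w, pess_w, 0, next(tie), tasks, ())]
    while frontier:
        opt_w, pess_w, _, _, tasks, plan = heapq.heappop(frontier)
        if not tasks and opt_w == pess_w:  # terminating node with an exact weight
            return list(plan), opt_w
        for new_tasks, new_plan in expand(tasks, plan):
            o, p = weights(new_tasks, new_plan)
            heapq.heappush(frontier, (o, p, len(new_plan), next(tie), new_tasks, new_plan))
    return [], float("inf")


# Hypothetical toy problem: prefer renting a car (weight 0) to flying (0.5).
def toy_expand(tasks: Tasks, plan: Plan) -> List[Tuple[Tasks, Plan]]:
    if not tasks:
        return []  # nothing left to decompose: the frontier is left as is
    head, rest = tasks[0], tasks[1:]
    if head == "arrange-trans":  # two alternative methods
        return [(("book-flight", "pay") + rest, plan), (("book-car", "pay") + rest, plan)]
    return [(rest, plan + (head,))]  # primitive task: append it to the plan


def toy_weights(tasks: Tasks, plan: Plan) -> Tuple[float, float]:
    # Exact once a booking is made, since this toy domain books at most once.
    if "book-car" in plan:
        return 0.0, 0.0
    if "book-flight" in plan:
        return 0.5, 0.5
    return (0.0, 1.0) if tasks else (1.0, 1.0)  # undecided: optimistic 0, pessimistic 1
```

With these toy functions, `best_first(("arrange-trans",), toy_expand, toy_weights)` returns `(["book-car", "pay"], 0.0)`: the flight branch is set aside as soon as its weight rises to 0.5, while the car branch keeps an optimistic weight of 0 and is expanded to completion.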

Theorem 2 (Soundness and Optimality)

Let $\mathcal{P} = (s_0, w, D, \Phi_{HTN})$ be an HTN planning problem with user preferences. Let $\pi$ be the plan returned by HTNPREF from input $\mathcal{P}$. Then $\pi$ is a solution to the preference-based HTN problem $\mathcal{P}$.

Proof sketch: We prove that the algorithm terminates, appealing to the fact that the PFD procedure is sound and complete. We prove that the returned plan is optimal by exploiting the correctness of the progression of preference formulae and the admissibility of our evaluation function.

5.1 Experiments 

We implemented our preference-based HTN planner, HTNPREF, on top of the LISP implementation of SHOP2 (Nau et al. 2003). All experiments were run on a Pentium 4 HT, 3 GHz CPU with 1 GB of RAM, with a time limit of 900 seconds. Since the optimality of HTNPREF-generated plans was established in Theorem 2, our objective was to evaluate the effectiveness of our heuristics in guiding search towards the optimal plan, and to establish benchmarks for future study, since none currently exist.

We tested HTNPREF with the ZenoTravel and Logistics domains, which were adapted from the International Planning Competition (IPC). The ZenoTravel domain involves transporting people on aircraft that can fly at two alternative speeds between locations. The Logistics domain involves transporting packages to different destinations, using trucks for delivery within cities and planes for delivery between cities.

[Figure 2 tables omitted: (a) ZenoTravel domain; (b) Logistics domain]

Figure 2: Our criteria for comparison are the number of Nodes Expanded (NE), i.e., the number of applied operators; the number of Nodes Considered (NC), i.e., the number of nodes that were added to the frontier; and time, measured in seconds. Note that NC is equal to NE for SHOP2. PL is the Plan Length and # Plan is the total number of plans.

In order to evaluate the effectiveness of HTNPREF, it would have been appealing to compare our planner with a preference-based planner that also makes use of procedural control knowledge. But since no comparable planner exists, and it would not have been fair to compare HTNPREF with a preference-based planner that does not use control knowledge, we compared HTNPREF with SHOP2, using a brute-force technique for SHOP2 to determine the optimal plan. In particular, as is often done with Markov Decision Processes, SHOP2 generated all plans that satisfied the HTN specification and then evaluated each one to find the optimal plan. Note that the times reported for SHOP2 do not include the time for post hoc preference evaluation, so they are lower bounds on the time needed to compute the optimal plan.
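The brute-force baseline amounts to exhaustive enumeration followed by post hoc evaluation. A minimal sketch, assuming (as the ascending sort by optW implies) that a lower weight means a more preferred plan; `brute_force_optimal` is a hypothetical name, and `weight` is a hypothetical stand-in for preference evaluation. The toy plans and costs are illustrative only.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Plan = Tuple[str, ...]


def brute_force_optimal(
    plans: Iterable[Plan], weight: Callable[[Plan], float]
) -> Optional[Tuple[Plan, float]]:
    """Evaluate every plan post hoc and keep the lowest-weight one."""
    best: Optional[Tuple[Plan, float]] = None
    for p in plans:                  # every plan that satisfies the HTN specification
        w = weight(p)                # post hoc preference evaluation
        if best is None or w < best[1]:
            best = (p, w)
    return best


# Toy enumeration: two tasks, each achievable by one of two operators.
cost = {"fly": 3.0, "drive": 1.0}
print(brute_force_optimal(product(["fly", "drive"], repeat=2),
                          lambda p: sum(cost[o] for o in p)))
# (('drive', 'drive'), 2.0)
```

The loop visits every plan, so the baseline's cost grows with # Plan no matter how early the optimal plan happens to be generated, whereas the best-first search can stop at the first terminating node it removes from the frontier.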

Figure 2 reports our experimental results for the ZenoTravel and Logistics domains. The problems vary in preference difficulty and are shown in order of difficulty with respect to the number of possible plans (# Plan) that satisfy the HTN control.

The results show that, in all but the first problem of each domain, SHOP2 required more time to find the optimal plan and expanded more nodes. In particular, note that in problems 11 and 12 SHOP2 ran out of time (900 seconds), while HTNPREF found the optimal plan well within the time limit. Also note that HTNPREF expands far fewer nodes than SHOP2, illustrating the effectiveness of our evaluation function in guiding search.

6 Summary and Related Work 

In this paper, we addressed the problem of generating preferred plans by combining the procedural control knowledge of HTNs with rich qualitative user preferences. The most significant contributions of this paper include: LPH, a rich HTN-tailored preference specification language, developed as an extension of a previously existing language; an approach to (preference-based) HTN planning based on forward-chaining heuristic search that exploits progression to evaluate the satisfaction of preferences during planning; a sound and optimal implementation of an ordered-task-decomposition preference-based HTN planner; and, leveraging previous research, an encoding of HTN planning with preferences in the situation calculus that enabled us to prove our theoretical results. While the implementation we present here exploits SHOP2, the language and techniques proposed are relevant to a broad range of HTN planners.

In previous work, we addressed the problem of integrating user preferences into Web service composition (Sohrabi, Prokoshyna, and McIlraith 2006). To that end, we developed a Golog-based composition engine that also exploits heuristic search and similarly uses an optimistic heuristic. The language used in that work was LPP, which had no Web-service- or Golog-specific extensions for complex actions. This paper's HTN-tailored language and HTN-based planner are significantly different.

Preference-based planning has been the subject of much interest in the last few years, spurred on by an International Planning Competition (IPC) track on this subject. A number of planners were developed, all based on the competition's PDDL3 language (Gerevini and Long 2005). Our work is distinguished in that it exploits procedural (action-centric) domain control knowledge in the form of an HTN, and action-centric and state-centric preferences in the form of LPH. In contrast, the preferences and domain control in PDDL3 and its variants are strictly state-centric. Further, LPH is qualitative whereas PDDL3 is quantitative, appealing to a numeric objective function. We contend that qualitative, action- or task-centric preferences are often more compelling and easier to elicit than their PDDL3 counterparts.

While no other HTN planner can perform true preference-based planning, SHOP2 (Nau et al. 2003) and ENQUIRER (Kuter et al. 2004) handle some simple user constraints. In particular, the order of methods and sorted preconditions in a domain description specifies a user preference over which method is preferred for decomposing a task. Hence, users may write different versions of a domain description to specify simple preferences. However, unlike in HTNPREF, these user constraints are treated as hard constraints, and (partial) plans that do not meet them are pruned from the search space. Further, there is no way to handle temporally extended hard or soft constraints in SHOP2. We used progression in our approach to planning precisely to deal with these interesting preferences. Had we limited the expressive power of preferences to SHOP2-like method ordering, we would have created a different planner. Interestingly, SHOP2 method ordering can still be exploited in our approach, but doing so requires a mechanism that is beyond the scope of this paper.
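The difference between treating a user constraint as a hard constraint (pruning) and treating it as a soft preference (ranking) can be shown on a toy example. The plans, the `never_fly` constraint, and the `flights` violation count are hypothetical, and neither function reproduces SHOP2's or HTNPREF's actual machinery.

```python
from typing import Callable, List, Tuple

Plan = Tuple[str, ...]


def prune_hard(plans: List[Plan], ok: Callable[[Plan], bool]) -> List[Plan]:
    # Hard constraint: plans that violate it are discarded outright.
    return [p for p in plans if ok(p)]


def rank_soft(plans: List[Plan], violation: Callable[[Plan], int]) -> List[Plan]:
    # Soft preference: every plan survives; violations only lower its rank.
    return sorted(plans, key=violation)


def never_fly(p: Plan) -> bool:
    return "fly" not in p


def flights(p: Plan) -> int:
    return p.count("fly")


plans = [("fly", "fly"), ("drive", "fly")]
print(prune_hard(plans, never_fly))  # []
print(rank_soft(plans, flights))     # [('drive', 'fly'), ('fly', 'fly')]
```

When no plan satisfies the constraint, pruning leaves nothing, while ranking still returns the least-violating plan first.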

Finally, the ASPEN planner (Rabideau, Engelhardt, and Chien 2000) performs a simple form of preference-based planning, focused mainly on preferences over resources and with far less expressivity than LPH. Nevertheless, ASPEN has the ability to plan with HTN-like task decomposition, and as such, this work is related in spirit, though not in approach, to our work.